Density-based hierarchical clustering of pyro-sequences on a large scale—the case of fungal ITS1

نویسندگان

  • Marco Pagni
  • Hélène Niculita-Hirzel
  • Loïc Pellissier
  • Anne Dubuis
  • Ioannis Xenarios
  • Antoine Guisan
  • Ian R. Sanders
  • Jérôme Goudet
  • Nicolas Guex
چکیده

MOTIVATION Analysis of millions of pyro-sequences is currently playing a crucial role in the advance of environmental microbiology. Taxonomy-independent, i.e. unsupervised, clustering of these sequences is essential for the definition of Operational Taxonomic Units. For this application, reproducibility and robustness should be the most sought after qualities, but have thus far largely been overlooked. RESULTS More than 1 million hyper-variable internal transcribed spacer 1 (ITS1) sequences of fungal origin have been analyzed. The ITS1 sequences were first properly extracted from 454 reads using generalized profiles. Then, otupipe, cd-hit-454, ESPRIT-Tree and DBC454, a new algorithm presented here, were used to analyze the sequences. A numerical assay was developed to measure the reproducibility and robustness of these algorithms. DBC454 was the most robust, closely followed by ESPRIT-Tree. DBC454 features density-based hierarchical clustering, which complements the other methods by providing insights into the structure of the data. AVAILABILITY An executable is freely available for non-commercial users at ftp://ftp.vital-it.ch/tools/dbc454. It is designed to run under MPI on a cluster of 64-bit Linux machines running Red Hat 4.x, or on a multi-core OSX system. CONTACT [email protected] or [email protected].

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A partition-based algorithm for clustering large-scale software systems

Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...

متن کامل

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...

متن کامل

High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...

متن کامل

PCR-restriction enzyme method based on the polymorphism in ribosomal-DNA for identification of the most important dermatophyte species in Iran

Background and aim: Dermatophytosis (tinea, ringworm) is the infection of skin, hair or nail that is caused by various keratinophilic fungi (dermatophytes). Dermatophytosis is a common infection throughout the world including all parts of Iran. As conventional laboratory procedures for identification of different dermatophytes are slow or lack specificity, more rapid and reliable methods are st...

متن کامل

خوشه‌بندی داده‌ها بر پایه شناسایی کلید

Clustering has been one of the main building blocks in the fields of machine learning and computer vision. Given a pair-wise distance measure, it is challenging to find a proper way to identify a subset of representative exemplars and its associated cluster structures. Recent trend on big data analysis poses a more demanding requirement on new clustering algorithm to be both scalable and accura...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره 29  شماره 

صفحات  -

تاریخ انتشار 2013